Customer Churn Problem

Problem Background

Customer churn is a problem that all companies need to monitor, especially those that depend on subscription-based revenue streams. Customer churn occurs when a customer ends their relationship with a company, and it’s a costly problem. Customers are the fuel that powers a business, so losing them directly reduces sales. Further, it’s much more difficult and costly to gain new customers than it is to retain existing ones. As a result, organizations need to focus on reducing customer churn.

The dataset used for this Keras tutorial is the IBM Watson Telco dataset. According to IBM, the business challenge is:

“A telecommunications company [Telco] is concerned about the number of customers leaving their landline business for cable competitors. They need to understand who is leaving. Imagine that you’re an analyst at this company and you have to find out who is leaving and why.”

We are going to use the Keras library to develop a deep learning model in Python that predicts which customers are likely to churn. We walk you through the preprocessing steps, spending time on how to format the data for Keras.

Finally, we show you how to get insights out of the black-box neural network (NN) model.

Read Data

The dataset includes information about:

Prune and clean dataset
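Here is a minimal sketch of reading and cleaning the data, using only the Python standard library. It assumes the CSV uses the column names of the commonly distributed IBM Telco file (customerID, tenure, MonthlyCharges, TotalCharges, Churn, and so on). The four rows below are made up for illustration, and load_and_clean is a hypothetical helper. It drops customerID, because a unique identifier carries no predictive signal. It also skips rows with a blank TotalCharges value, which occur in a few rows of the commonly distributed file. Finally, it converts the numeric columns and encodes Churn as 1/0.

```python
import csv
import io

# Hypothetical sample rows for illustration only; a real run would read the
# downloaded CSV file instead of this in-memory string.
RAW_CSV = """customerID,gender,tenure,Contract,MonthlyCharges,TotalCharges,Churn
0001-A,Female,1,Month-to-month,29.85,29.85,No
0002-B,Male,34,One year,56.95,1889.5,No
0003-C,Male,0,Two year,52.55, ,No
0004-D,Female,2,Month-to-month,70.70,151.65,Yes
"""


def load_and_clean(text):
    """Parse CSV text, drop the ID column and rows with blank TotalCharges."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # The customer identifier is unique per row, so it carries no signal.
        row.pop("customerID", None)
        # Skip rows whose TotalCharges is empty or only whitespace.
        if not row["TotalCharges"].strip():
            continue
        row["tenure"] = int(row["tenure"])
        row["MonthlyCharges"] = float(row["MonthlyCharges"])
        row["TotalCharges"] = float(row["TotalCharges"])
        # Encode the target as 1 (churned) or 0 (stayed).
        row["Churn"] = 1 if row["Churn"] == "Yes" else 0
        cleaned.append(row)
    return cleaned


rows = load_and_clean(RAW_CSV)
print(len(rows))  # 3
```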

Split data
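The split ratio and method aren’t stated here, so this sketch assumes a random 80/20 train/test split with a fixed seed for reproducibility. In practice, scikit-learn’s sklearn.model_selection.train_test_split does the same job. The hypothetical stdlib helper below only makes the mechanics visible.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```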

Preprocess/Normalize the Data
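Neural networks train more reliably when numeric inputs are on comparable scales and categorical columns are encoded as numbers. The tutorial’s exact transforms aren’t shown, so this minimal stdlib sketch assumes z-score standardization for numeric columns and one-hot encoding for categoricals. The helper names and sample values are hypothetical. The key design point is that the mean, standard deviation, and category list are fit on the training split only and then reused on the test split, so no test information leaks into training.

```python
import statistics


def fit_standardizer(values):
    """Return (mean, std) computed on training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std


def standardize(values, mean, std):
    """Apply a previously fitted z-score transform."""
    return [(v - mean) / std for v in values]


def one_hot(values, categories):
    """Encode each value as a 0/1 vector over a fixed category list."""
    return [[1 if v == c else 0 for c in categories] for v in values]


train_tenure = [1, 34, 2, 45]
test_tenure = [8, 22]
mean, std = fit_standardizer(train_tenure)
train_scaled = standardize(train_tenure, mean, std)
test_scaled = standardize(test_tenure, mean, std)  # reuse training statistics

contracts = sorted({"Month-to-month", "One year", "Two year"})
encoded = one_hot(["One year", "Month-to-month"], contracts)
print(encoded)  # [[0, 1, 0], [1, 0, 0]]
```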

Build the NN model

Finally, Deep Learning with Keras in Python!

The first step is to initialize a sequential model, which is the beginning of our Keras model. The sequential model is composed of a linear stack (sequence) of layers.

Note: The first layer needs the input_shape, which is the number of features fed into the network. In this case, it is the number of columns in the training feature matrix.
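The original model code isn’t shown here. Below is a minimal sketch, assuming TensorFlow 2.x with its bundled Keras. Two hidden layers of 16 ReLU units with 10% dropout are illustrative assumptions, not the tutorial’s tuned architecture. The value n_features = 35 is a hypothetical placeholder for the number of columns after preprocessing. Recent Keras versions recommend declaring the input shape with a separate keras.Input as the first element. Older code passes input_shape=(n_features,) to the first Dense layer instead. Either way, it plays the role of the input_shape described above.

```python
import numpy as np
from tensorflow import keras

n_features = 35  # hypothetical: number of feature columns after preprocessing

model = keras.Sequential([
    # Input shape: one value per feature column (batch size left open).
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dropout(0.1),  # randomly zero 10% of activations while training
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dropout(0.1),
    # A single sigmoid unit outputs the probability that a customer churns.
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Sanity check on dummy input: the model returns one probability per row.
dummy = np.zeros((2, n_features), dtype="float32")
print(model.predict(dummy, verbose=0).shape)  # (2, 1)
```

Binary cross-entropy with a sigmoid output is the standard pairing for a yes/no target like churn.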

We want high validation accuracy. Once the validation accuracy curve begins to flatten or decrease, it’s time to stop training, because further epochs mostly fit noise in the training data (overfitting).
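The stopping rule can be made concrete with a hypothetical helper. It takes per-epoch validation accuracies (the values below are invented) and returns the epoch to keep. It stops after patience epochs pass without improvement.

```python
def stopping_epoch(val_accuracy, patience=3, min_delta=0.0):
    """Return the index of the best epoch, stopping once `patience` epochs
    pass without val_accuracy improving by more than `min_delta`."""
    best_epoch, best_value, waited = 0, float("-inf"), 0
    for epoch, value in enumerate(val_accuracy):
        if value > best_value + min_delta:
            best_epoch, best_value, waited = epoch, value, 0
        else:
            waited += 1
            if waited >= patience:
                break  # the curve has flattened or started to drop
    return best_epoch


history = [0.70, 0.76, 0.79, 0.80, 0.80, 0.79, 0.78, 0.81]
print(stopping_epoch(history, patience=3))  # 3
```

Note that the later 0.81 is never reached with patience=3, which is the trade-off patience controls. In Keras itself, the built-in callback keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True), passed to model.fit through callbacks=[...], applies essentially the same rule.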

Let’s make some predictions from our Keras model on the test dataset, which the model did not see during training.
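With a single sigmoid output unit, Keras’s model.predict returns a churn probability for each test row (an array of shape (n_samples, 1)), not a class label. Here is a minimal sketch of turning those probabilities into yes/no predictions, assuming the common 0.5 threshold. The probabilities below are invented, and to_labels is a hypothetical helper.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert churn probabilities into 0/1 class predictions."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Invented sigmoid outputs, standing in for model.predict(x_test).ravel()
probs = [0.91, 0.12, 0.50, 0.33, 0.74]
print(to_labels(probs))  # [1, 0, 1, 0, 1]
```

Raising the threshold trades recall for precision, which can matter when retention offers are expensive.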

Test Loss and Test Accuracy
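When the model is compiled with loss="binary_crossentropy" and metrics=["accuracy"], model.evaluate(x_test, y_test) returns the test loss and test accuracy. To show what those two numbers mean, this stdlib sketch computes them by hand on invented labels and probabilities. Keras’s own values may differ in the last decimal places, for example because of clipping and float32 arithmetic.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_test = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
print(round(binary_cross_entropy(y_test, y_prob), 4))  # 0.3375
print(accuracy(y_test, y_pred))  # 0.75
```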

AUC

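The area under the ROC (receiver operating characteristic) curve, or AUC, measures how well the model ranks customers. It is the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner. An AUC of 0.5 is no better than random guessing, and 1.0 is a perfect ranking.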
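AUC can be computed directly from its ranking interpretation. Compare every churner’s score with every non-churner’s score and count the fraction of pairs ranked correctly, with ties counted as half (the Mann-Whitney formulation). The hypothetical helper below does exactly that on invented data. It is quadratic in the number of samples, so for a full test set sklearn.metrics.roc_auc_score is the practical choice.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


y_test = [1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.3, 0.6, 0.6, 0.4, 0.2]
print(roc_auc(y_test, y_prob))  # about 0.833 (7.5 of 9 pairs ranked correctly)
```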

Precision and Recall

Precision answers: when the model predicts “yes”, how often is it actually “yes”? Recall (also called the true positive rate) answers: when the actual value is “yes”, how often does the model predict “yes”?
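Both definitions reduce to counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP) and recall = TP / (TP + FN). In this stdlib sketch, the helper names and labels are hypothetical. Returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means churn."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Compute precision and recall from the confusion counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted "yes", share correct
    recall = tp / (tp + fn) if tp + fn else 0.0  # of actual "yes", share found
    return precision, recall


y_test = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
print(precision_recall(y_test, y_pred))  # (0.6, 0.75)
```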

F1 Score

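The harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall). It is high only when both precision and recall are high.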
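Because F1 is a harmonic mean, it is pulled toward the smaller of the two values, so a model cannot score well by excelling at only one of them. Here is a small sketch with a hypothetical helper and invented inputs. Returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Perfect precision but poor recall: F1 stays low.
print(f1_score(1.0, 0.1))  # about 0.18, far below the arithmetic mean of 0.55
print(f1_score(0.6, 0.75))  # about 0.667
```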

Inspect Performance of specific samples
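The original inspection code isn’t shown. One hypothetical way to look at individual samples is to list the test rows the model gets wrong, most confident mistakes first. You can then look up those customers’ features and ask why the model missed them. The helper name and the labels and probabilities below are invented.

```python
def worst_mistakes(y_true, y_prob, threshold=0.5, top=3):
    """Return (index, actual, probability) for misclassified samples,
    ordered from the most confident mistake to the least."""
    mistakes = []
    for i, (t, p) in enumerate(zip(y_true, y_prob)):
        pred = 1 if p >= threshold else 0
        if pred != t:
            confidence = abs(p - threshold)  # distance from the decision boundary
            mistakes.append((confidence, i, t, p))
    mistakes.sort(reverse=True)
    return [(i, t, p) for _, i, t, p in mistakes[:top]]


y_test = [1, 0, 1, 0, 1]
y_prob = [0.82, 0.64, 0.27, 0.95, 0.48]
print(worst_mistakes(y_test, y_prob))  # [(3, 0, 0.95), (2, 1, 0.27), (1, 0, 0.64)]
```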

Confusion Table
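A confusion table cross-tabulates actual classes against predicted classes. This stdlib sketch uses the same layout as sklearn.metrics.confusion_matrix for binary 0/1 labels. Rows are actual and columns are predicted, so the table is [[TN, FP], [FN, TP]]. The helper name and labels are hypothetical.

```python
from collections import Counter


def confusion_table(y_true, y_pred):
    """Return [[tn, fp], [fn, tp]]; rows are actual, columns are predicted."""
    counts = Counter(zip(y_true, y_pred))  # missing pairs count as 0
    return [[counts[(0, 0)], counts[(0, 1)]],
            [counts[(1, 0)], counts[(1, 1)]]]


y_test = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
table = confusion_table(y_test, y_pred)
print(table)  # [[2, 2], [1, 3]]
print("          pred 0  pred 1")
print(f"actual 0 {table[0][0]:>7} {table[0][1]:>7}")
print(f"actual 1 {table[1][0]:>7} {table[1][1]:>7}")
```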